
    Are ambiguous conjunctions problematic for machine translation?

    The translation of ambiguous words still poses challenges for machine translation. In this work, we carry out a systematic quantitative analysis of the ability of different machine translation systems to disambiguate the source-language conjunctions “but” and “and”. We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction “but” on 20 translation outputs and the conjunction “and” on 10. All machine translation systems recognise one variant of the target conjunction almost perfectly, especially for the source conjunction “but”. The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50% to 95% for “but” and from 20% to 57% for “and”. The major error for all systems is replacing the correct target variant with the opposite one.
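    To make the kind of evaluation described above concrete, the sketch below computes per-variant accuracy on a contrastive test set. It is illustrative only: the data format, the placeholder variant labels and the naive token match are assumptions, not the authors' actual test sets or scoring code.

        # Minimal sketch: per-variant accuracy for conjunction disambiguation.
        # The variant labels and toy data are placeholders, not items from the
        # paper's test sets.
        from collections import Counter

        def per_variant_accuracy(outputs, gold_variants):
            correct, total = Counter(), Counter()
            for hyp, gold in zip(outputs, gold_variants):
                total[gold] += 1
                if gold in hyp.lower().split():  # naive token match on the expected variant
                    correct[gold] += 1
            return {v: correct[v] / total[v] for v in total}

        # Toy usage with placeholder tokens standing in for the two target variants.
        outputs = ["... variant_a ...", "... variant_b ...", "... variant_a ..."]
        gold = ["variant_a", "variant_b", "variant_b"]
        print(per_variant_accuracy(outputs, gold))  # {'variant_a': 1.0, 'variant_b': 0.5}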

    On context span needed for machine translation evaluation

    Despite increasing efforts to improve the evaluation of machine translation (MT) by going beyond the sentence level to the document level, the definition of what exactly constitutes a “document level” is still not clear. This work deals with the context span necessary for a more reliable MT evaluation. We report results from a series of surveys involving three domains and 18 target languages designed to identify the necessary context span as well as issues related to it. Our findings indicate that, although some issues and spans are strongly dependent on the domain and on the target language, a number of common patterns can be observed, so that general guidelines for context-aware MT evaluation can be drawn.

    On the same page? Comparing inter-annotator agreement in sentence and document level human machine translation evaluation

    Document-level evaluation of machine translation has raised interest in the community, especially since responses to the claims of “human parity” (Toral et al., 2018; Läubli et al., 2018) based on document-level human evaluations have been published. Yet, little is known about best practices regarding human evaluation of machine translation at the document level. This paper presents a comparison of the differences in inter-annotator agreement between quality assessments using sentence-level and document-level set-ups. We report results of the agreement between professional translators for fluency and adequacy scales, error annotation, and pair-wise ranking, along with the effort needed to perform the different tasks. To the best of our knowledge, this is the first study of its kind.
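    To illustrate the kind of comparison described above, the sketch below computes Cohen's kappa for two raters under a sentence-level and a document-level set-up. The ratings are invented placeholders and the choice of plain (unweighted) kappa is an assumption; the study's own agreement statistics are not reproduced here.

        # Sketch: inter-annotator agreement for sentence- vs document-level ratings.
        from sklearn.metrics import cohen_kappa_score

        # Hypothetical adequacy ratings (1-4) from two professional translators.
        sent_rater1 = [4, 3, 2, 4, 1, 3]
        sent_rater2 = [4, 2, 2, 4, 1, 2]
        doc_rater1 = [3, 4, 2, 4]
        doc_rater2 = [2, 4, 3, 4]

        print("sentence-level kappa:", cohen_kappa_score(sent_rater1, sent_rater2))
        print("document-level kappa:", cohen_kappa_score(doc_rater1, doc_rater2))
        # For ordinal scales, cohen_kappa_score(..., weights="quadratic") gives weighted kappa.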

    Evaluating the impact of light post-editing on usability

    This paper discusses a methodology for measuring the usability of machine-translated content by end users, comparing lightly post-edited content with raw output and with the usability of source-language content. The content selected consists of Online Help articles from a software company for a spreadsheet application, translated from English into German. Three groups of five users each worked with either the English source text (EN), the raw MT version (DE_MT), or the lightly post-edited version (DE_PE), and were asked to carry out six tasks. Usability was measured with an eye tracker together with cognitive, temporal and pragmatic measures; satisfaction was measured via a post-task questionnaire presented after the participants had completed the tasks.

    Acceptability of machine-translated content: a multi-language evaluation by translators and end-users

    As machine translation (MT) continues to be used increasingly in the translation industry, there is a corresponding increase in the need to understand MT quality and, in particular, its impact on end-users. To date, little work has been carried out to investigate how acceptable end-users find MT output. This article reports on research conducted to address that gap. End-users of instructional content machine-translated from English into German, Simplified Chinese and Japanese were engaged in a usability experiment. Part of this experiment involved giving feedback on the acceptability of raw machine-translated content and lightly post-edited (PE) versions of the same content. In addition, a quality review was carried out in collaboration with an industry partner and experienced translation quality reviewers. The translation quality assessment (TQA) results from translators reflect the usability and satisfaction results from end-users insofar as light PE both increased the usability and acceptability of the instructions and led to reported satisfaction. Nonetheless, the raw MT content also received good scores, especially for terminology, country standards and spelling.

    A human evaluation of English-Irish statistical and neural machine translation

    With official status in both Ireland and the EU, there is a need for high-quality English-Irish (EN-GA) machine translation (MT) systems that are suitable for use in a professional translation environment. While we have seen recent research on improving both statistical MT and neural MT for the EN-GA pair, the results of such systems have always been reported using automatic evaluation metrics. This paper provides the first human evaluation study of EN-GA MT using professional translators and in-domain (public administration) data for a more accurate depiction of the translation quality available via MT.

    Reading comprehension of machine translation output: what makes for a better read?

    This paper reports on a pilot experiment that compares two different machine translation (MT) paradigms in reading comprehension tests. To explore a suitable methodology, we set up a pilot experiment with a group of six users (native speakers of English, Spanish and Simplified Chinese), using material from the International English Language Testing System (IELTS) and an eye-tracker. The users were asked to read three texts in their native language: either the original English text (for the English speakers) or the machine-translated text (for the Spanish and Simplified Chinese speakers). The original texts were machine-translated via two MT systems: neural (NMT) and statistical (SMT). The users were also asked to rate satisfaction statements on a 3-point scale after reading each text and answering the respective comprehension questions. After all tasks were completed, a post-task retrospective interview took place to gather qualitative data. The findings suggest that the users of the target languages completed more tasks in less time and with a higher level of satisfaction when using translations from the NMT system.

    Translation dictation vs. post-editing with cloud-based voice recognition: a pilot experiment

    In this paper, we report on a pilot mixed-methods experiment investigating the effects on productivity and on the translator experience of integrating machine translation (MT) post-editing (PE) with voice recognition (VR) and translation dictation (TD). The experiment was performed with a sample of native Spanish participants. In the quantitative phase of the experiment, they performed four tasks under four different conditions, namely (1) conventional TD; (2) PE in dictation mode; (3) TD with VR; and (4) PE with VR (PEVR). In the follow-on qualitative phase, the participants filled out an online survey, providing details of their perceptions of the task and of PEVR in general. Our results suggest that PEVR may be a usable way to add MT to a translation workflow, with some caveats. When asked about their experience with the tasks, our participants preferred translation without the ‘constraint’ of MT, though the quantitative results show that PE tasks were generally more efficient. This paper provides a brief overview of past work exploring VR for from-scratch translation and PE purposes, describes our pilot experiment in detail, presents an overview and analysis of the data collected, and outlines avenues for future work.

    Document-level machine translation evaluation project: methodology, effort and inter-annotator agreement

    Recently, document-level (doc-level) human evaluation of machine translation (MT) has raised interest in the community after a few attempts have disproved claims of “human parity” (Toral et al., 2018; Läubli et al., 2018). However, little is still known about best practices regarding doc-level human evaluation. This project aims to identify methodologies to better cope with i) the current state-of-the-art (SOTA) human metrics, ii) a possible complexity when assigning a single score to a text consisting of ‘good’ and ‘bad’ sentences, iii) a possible tiredness bias in doc-level set-ups, and iv) the difference in inter-annotator agreement (IAA) between sentence and doc-level set-ups.
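    Point ii) above can be made concrete with a small, invented example: two documents may receive the same single doc-level score even though one of them mixes clearly ‘good’ and clearly ‘bad’ sentences. The scores below are placeholders, not data from the project.

        # Sketch: a single averaged doc-level score hides sentence-level variance.
        from statistics import mean, stdev

        uniform_doc = [3, 3, 3, 3]  # consistently mediocre sentences
        mixed_doc = [5, 1, 5, 1]    # alternating 'good' and 'bad' sentences

        for name, scores in [("uniform", uniform_doc), ("mixed", mixed_doc)]:
            print(name, "mean:", mean(scores), "stdev:", round(stdev(scores), 2))
        # Both means are 3.0, but the spread differs, which one doc-level score cannot show.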

    How much context span is enough? Examining context-related issues for document-level MT

    This paper analyses how much context span is necessary to resolve different context-related issues, namely reference, ellipsis, gender, number, lexical ambiguity, and terminology, when translating from English into Portuguese. We use the DELA corpus, which consists of 60 documents from six different domains (subtitles, literary, news, reviews, medical, and legislation). We find that the shortest context span needed to disambiguate an issue can appear in different positions in the document, including the preceding context, the following context, the global context, or world knowledge, and that its average length depends on the issue type as well as the domain. Additionally, we show that the standard approach of relying on only the two preceding sentences as context might not be enough, depending on the domain and issue types.
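    The notion of context span can be illustrated with a toy analysis: given annotations that record where an issue occurs and where the closest disambiguating information is found, one can measure the span per issue type and check how often a fixed window of two preceding sentences would miss it. The annotation format and values below are hypothetical, not the DELA corpus schema.

        # Sketch: context-span distances per issue type and coverage of a
        # 2-preceding-sentence window. All annotations below are invented.
        from collections import defaultdict

        annotations = [
            # (issue_type, issue_sentence_idx, disambiguating_sentence_idx; None = world knowledge)
            ("reference", 12, 9),
            ("ellipsis", 4, 3),
            ("gender", 20, 25),        # resolved by *following* context
            ("terminology", 7, None),  # needs world/domain knowledge
        ]

        spans = defaultdict(list)
        missed_by_two_preceding = 0
        for issue, pos, source in annotations:
            if source is None:
                continue
            span = pos - source        # positive = preceding context, negative = following
            spans[issue].append(span)
            if not (0 < span <= 2):    # falls outside the standard 2-preceding-sentence window
                missed_by_two_preceding += 1

        for issue, dists in spans.items():
            print(issue, "average span:", sum(dists) / len(dists))
        print("issues not covered by two preceding sentences:", missed_by_two_preceding)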